After checking the most prevalent and abundant variants (present in more than 50 populations with an average allele frequency > 0.2), we also analyzed the variants detected in the populations evolved under each compound. This ensures we do not miss variants that are compound-specific (and therefore not prevalent) or that have variable allele frequencies (and therefore an average <= 0.2 across all populations in which they were detected).
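The split between prevalent/abundant variants and the remaining ones can be sketched as follows. This is a minimal illustration, assuming variant calls are available as `(variant_id, population, allele_frequency)` tuples; the record layout, the function name `split_by_prevalence`, and the toy data are hypothetical and not taken from the actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def split_by_prevalence(records, min_pops=50, min_mean_af=0.2):
    """Split variant ids into (prevalent_and_abundant, remaining).

    A variant is 'prevalent and abundant' when it is detected in more than
    `min_pops` populations AND its mean AF over those populations exceeds
    `min_mean_af`. Everything else goes to `remaining`.
    """
    af_by_variant = defaultdict(dict)
    for variant_id, population, af in records:
        af_by_variant[variant_id][population] = af

    prevalent, remaining = set(), set()
    for variant_id, per_pop in af_by_variant.items():
        if len(per_pop) > min_pops and mean(per_pop.values()) > min_mean_af:
            prevalent.add(variant_id)
        else:
            remaining.add(variant_id)
    return prevalent, remaining


# Toy data (hypothetical): v1 is found in 60 populations at high AF,
# v2 only in two populations with very different AFs.
records = [("v1", f"pop{i}", 0.9) for i in range(60)]
records += [("v2", "pop1", 0.8), ("v2", "pop2", 0.1)]
prevalent, remaining = split_by_prevalence(records)
```

The `remaining` set is the one carried into the compound-wise analysis described here.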
We filtered out variants around the following genes, as they appeared in many compounds and are discussed in the gene-wise variant analysis: peg.962 (detected in ALE 2.0); peg.1114 (in fact closer to peg.1113, a lipoprotein); peg.1351; peg.1416 and peg.1417; peg.1537; peg.1554 (detected in parental and control samples with AF == 1); and peg.1555.
We extracted the variants in ALE 2.0 and compared them to those for PFNA and PFOA in ALE 1.0, without any AF or gene filter (i.e. all variants with AF > 0.05).
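A minimal sketch of this gene-based exclusion, assuming each variant is a dictionary with a `gene` key holding the nearest annotated gene. The dictionary layout and the function name `drop_excluded` are assumptions made for illustration; only the gene IDs come from the list above.

```python
# Gene IDs excluded from the compound-wise analysis (from the list above).
EXCLUDED_GENES = {
    "peg.962", "peg.1114", "peg.1351", "peg.1416",
    "peg.1417", "peg.1537", "peg.1554", "peg.1555",
}


def drop_excluded(variants, excluded=EXCLUDED_GENES):
    """Keep only variants whose nearest gene is not in the exclusion set."""
    return [v for v in variants if v["gene"] not in excluded]


# Toy input (hypothetical positions and AFs).
variants = [
    {"gene": "peg.1554", "pos": 100, "af": 1.0},
    {"gene": "peg.319", "pos": 200, "af": 0.3},
]
kept = drop_excluded(variants)
```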
Gene peg.319 is annotated as beta-glucosidase (one of the 17 copies in the genome; EC 3.2.1.21;Ontology_term=KEGG_ENZYME:3.2.1.21; Glycosyl hydrolase family 3 C-terminal domain protein OS=Bacteroides uniformis in UniProt).
This has been discussed in gene wise analysis (1200664 - 1203843, strand -).
We then extracted and plotted the hotspot region, from 1,912 to 1,945 Kb. Five copies of SusC-SusD are found in this region. Gene IDs in this region range from peg.1536 to peg.1563.
As we have already analyzed peg.1537 (which should actually be peg.1536) and peg.1554 (detected in parental and control samples with AF == 1), we filter these two genes out here.
As we can see, among the five copies of TonB, the 2nd (peg.1544) and 5th (peg.1562) have more variants detected. We zoom in on those two genes:
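Extracting the region and tallying variants per gene might look like the sketch below. It assumes variants are dictionaries with 1-based `pos` and nearest `gene` fields and that the region bounds are inclusive; the field names, helper names, and toy positions are hypothetical.

```python
from collections import Counter

# Hotspot region bounds in bp (1,912 Kb to 1,945 Kb), treated as inclusive.
REGION_START, REGION_END = 1_912_000, 1_945_000


def variants_in_region(variants, start=REGION_START, end=REGION_END):
    """Return the variants whose position falls inside [start, end]."""
    return [v for v in variants if start <= v["pos"] <= end]


def count_by_gene(variants):
    """Count how many variants were assigned to each gene."""
    return Counter(v["gene"] for v in variants)


# Toy input (positions are made up for illustration only).
variants = [
    {"gene": "peg.1544", "pos": 1_920_500},
    {"gene": "peg.1544", "pos": 1_921_100},
    {"gene": "peg.1562", "pos": 1_943_000},
    {"gene": "peg.319", "pos": 400_000},
]
region_counts = count_by_gene(variants_in_region(variants))
```

Per-gene counts of this kind are one way to see which copies in the region carry the most variants.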
This gene (peg.1962, 2,402,232 to 2,403,446, + strand) is annotated as a site-specific recombinase belonging to the phage integrase family. Variants in the non-coding region (at 2,403,658) seem to be located after this gene and before peg.1963 (Capsular polysaccharide transcription antitermination protein UpxY family, or NusG or KOW domain-containing protein; 2,403,943 to 2,404,491). NusG is an intrinsic transcription termination factor that stimulates motility and coordinates gene expression with NusA (peg.1227).
This has been discussed in gene wise analysis (3255197 to 3256387, strand -, hypothetical protein).
There are a few mutations with AF > 0.5 after 3,500 Kb; however, none of them has replicates. Among these, peg.3135 and peg.3342 have already been plotted and are skipped here.
Firstly, we checked the overall distribution of variants in compound and control populations.
Among those variants, we decided to focus on the regions where variants with AF > 0.2 were detected, including the coding regions of peg.1417, peg.1562, peg.1703, peg.1704, peg.1976, peg.2523, and peg.2890. There are two non-coding variants with AF ~ 0.18 around genes peg.393 (Two-component system sensor histidine kinase) and peg.538 (Outer membrane TonB-dependent transporter utilization system for glycans and polysaccharides (PUL) SusC family); however, no replicates were found in those regions.
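The replicate check can be sketched as below. The working definition used here, that a variant "has replicates" when it is detected above an AF cutoff in at least two replicate populations of the same compound, is an assumption. The record fields, the function name `replicated_variants`, and the labels in the toy data are likewise hypothetical.

```python
from collections import defaultdict


def replicated_variants(calls, min_af=0.05, min_replicates=2):
    """Return {(compound, variant): sorted replicate labels} for variants
    seen with AF > min_af in at least `min_replicates` replicate populations
    of the same compound."""
    seen = defaultdict(set)
    for call in calls:
        if call["af"] > min_af:
            seen[(call["compound"], call["variant"])].add(call["replicate"])
    return {
        key: sorted(reps)
        for key, reps in seen.items()
        if len(reps) >= min_replicates
    }


# Toy calls (hypothetical): varA is seen in two replicates, varB in only one.
calls = [
    {"compound": "cmpdX", "variant": "varA", "replicate": "R1", "af": 0.35},
    {"compound": "cmpdX", "variant": "varA", "replicate": "R2", "af": 0.22},
    {"compound": "cmpdX", "variant": "varB", "replicate": "R1", "af": 0.18},
]
supported = replicated_variants(calls)
```

Raising `min_af` to 0.2 would restrict the check to the variants that drive the choice of regions here.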
This is not very interesting for Loperamide, as the AF in the parental and control strains is even higher than in the Loperamide-evolved populations. This gene was discussed for ALE 2.0 above, as the 5th TonB in the 1,912 Kb - 1,945 Kb region.
TrkAH units of a K+ channel / transporter, which might be related to growth adaptation. A study showed that a trkA mutation depolarizes the membrane compared to wild type (Zhang et al., 2020). Because these two genes are located on the - strand, most of the frameshift mutations are at the beginning of the protein-coding sequences.
The variant at position 2,420,681 is located after gene peg.1976 (Glyco_trans_1_4, Glyco_transf_4) but is found in many samples -> probably not doing anything?
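The strand reasoning can be made explicit with a small helper that converts a genomic position into a relative position along the coding sequence. This is only a sketch: the function name `cds_fraction` and the gene coordinates in the example are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def cds_fraction(pos, start, end, strand):
    """Relative position of `pos` within a CDS spanning [start, end].

    Returns a value in [0, 1): 0 is the start-codon end, values near 1 are
    close to the stop codon. On the - strand the CDS is read from `end`
    down to `start`, so high genomic coordinates map to the beginning.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not start <= pos <= end:
        raise ValueError("position lies outside the CDS")
    length = end - start + 1
    offset = pos - start if strand == "+" else end - pos
    return offset / length


# Hypothetical 1,000 bp gene at 1000..1999 with a variant at 1950.
frac_minus = cds_fraction(1950, 1000, 1999, "-")  # near the CDS start
frac_plus = cds_fraction(1950, 1000, 1999, "+")   # near the CDS end
```

For a - strand gene, a frameshift near the high-coordinate end therefore disrupts almost the whole protein, whereas the same genomic position in a + strand gene would affect only the C-terminal end.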
Synonymous variant in coding region of gene GDP-mannose 4,6-dehydratase.
Missense variant in coding region of gene putative type IIS restriction/modification enzyme.
Firstly, we checked the overall distribution of variants in compound and control populations.
Among those variants, we decided to focus on the regions where variants with AF > 0.2 were detected, including the coding regions of peg.1417, peg.1562, and peg.1976. All of those genes are discussed either in the gene-wise variant analysis or in another compound-wise variant analysis.
Firstly, we checked the overall distribution of variants in compound and control populations.
Among those variants, we decided to focus on the regions where variants with AF > 0.2 were detected, including the coding regions of peg.935, peg.1250, peg.1416, peg.1417, peg.1562, peg.2424, peg.2427, and peg.2855.
Gene peg.935 (RHS repeat-associated core domain protein) is on - strand.
Genes peg.2423 (YjbH, -), peg.2424 (-), peg.2427 (YjbH, +, tr|R7EIH8|R7EIH8_9BACE, lipoprotein).
Gene peg.2855 (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14);Ontology_term=KEGG_ENZYME:6.3.4.14) is on + strand.
We also extracted variants with AF > 0.1 and compared them between compounds -> PCoA of these variants? Rows are populations, columns are variants?
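One possible starting point for such a comparison is sketched below: it builds a population-by-variant AF matrix (rows are populations, columns are variants, with 0 where a variant was not detected) and computes a pairwise Bray-Curtis dissimilarity, which is the kind of distance matrix a PCoA takes as input. The ordination step itself is not included. The choice of Bray-Curtis, the record layout, and all names are assumptions for illustration.

```python
def af_matrix(records, min_af=0.1):
    """Build (populations, variants, matrix) from (population, variant, af)
    records, keeping only calls with AF > min_af. Missing calls become 0.0."""
    kept = [(p, v, af) for p, v, af in records if af > min_af]
    populations = sorted({p for p, _, _ in kept})
    variants = sorted({v for _, v, _ in kept})
    col = {v: j for j, v in enumerate(variants)}
    row = {p: i for i, p in enumerate(populations)}
    matrix = [[0.0] * len(variants) for _ in populations]
    for p, v, af in kept:
        matrix[row[p]][col[v]] = af
    return populations, variants, matrix


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative AF vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Toy records (hypothetical); the AF 0.08 call is dropped by the filter.
records = [
    ("popA", "v1", 0.5), ("popA", "v2", 0.2),
    ("popB", "v1", 0.5), ("popB", "v4", 0.08),
    ("popC", "v3", 0.3),
]
populations, variants, matrix = af_matrix(records)
distances = [[bray_curtis(a, b) for b in matrix] for a in matrix]
```

Populations sharing no variants (popA and popC in the toy data) get a dissimilarity of 1, while populations with overlapping variants at similar AF end up close together.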